An efficient virtual network interface in the FUGU scalable workstation, by Kenneth Martin Mackenzie
Author
Abstract
A scalable workstation is one vision of a mainstream parallel computer: a machine that combines scalable, fine-grain communication facilities for parallel applications with virtual memory and preemptive multiprogramming to support general-purpose workloads. A key challenge in a scalable workstation is the Virtual Network Interface (VNI) problem: high-performance communication for parallel programming depends on a tight coupling between the application and the network, while multiprogramming and virtual memory effects disrupt that coupling.

This thesis introduces and evaluates the "direct" virtual network interface, a solution to the VNI problem for fine-grain messages in a scalable workstation. The direct VNI employs two complementary architectural techniques to reconcile speed and protection. First, two-case delivery optimistically provides direct, user-level access to the network interface hardware but transparently backs the direct system with a robust, software-buffered system. Two-case delivery allows the scalable workstation to achieve both good parallel application performance, through the fast hardware interface, and good global system performance, by permitting buffering when multiprogramming requires it. Second, the software-buffered mode uses virtual buffering to provide effectively unlimited buffer capacity by storing messages in dynamically managed virtual memory. Virtual buffering gives the user the convenient illusion of a very large buffer while giving the operating system the means to minimize actual physical memory consumption.

The direct VNI ideas are implemented in an experimental scalable workstation, FUGU, consisting of emulated hardware, a matching simulator, and a custom operating system. Results from workloads of real and synthetic applications show that the direct VNI provides high performance because the direct case is both fast and common. Microbenchmarks show that the protected direct delivery case costs only 60% more (tens of cycles per message) than unprotected messages on the same hardware. Further, in a mixed workload experiment, our parallel applications see only 14-33% of messages buffered when 10% of the CPU time is devoted to unrelated, high-priority, interactive tasks. Finally, results show that physical buffering requirements remain naturally low in real applications despite the combination of unacknowledged messages and unlimited buffering.

Thesis Supervisor: Anant Agarwal
Title: Associate Professor of Computer Science and Engineering

Thesis Supervisor: M. Frans Kaashoek
Title: Associate Professor of Computer Science and Engineering
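To make the two-case idea concrete, the C fragment below is a minimal, hypothetical sketch of a receive path with a fast direct case and a software-buffered fallback, in the spirit of the design described in the abstract. It is not the FUGU implementation: the register layout, the status-bit encoding, and the helpers (receive_direct, vqueue_dequeue, the vqueue_t type) are all invented for illustration, and the real system's handling of ordering, atomicity, and scheduling is considerably more involved.

/* Hedged sketch of two-case delivery: a direct, user-level path that
 * reads memory-mapped network-interface registers, backed by a
 * software queue kept in pageable virtual memory ("virtual buffering").
 * All names and encodings here are hypothetical. */
#include <stdint.h>
#include <stddef.h>

#define MSG_MAX_WORDS 8

typedef struct {
    uint32_t words[MSG_MAX_WORDS];
    size_t   len;
} msg_t;

/* Direct (common) case: the application drains the hardware receive
 * FIFO itself, in user mode, with a handful of loads per message. */
static int receive_direct(volatile uint32_t *rx_fifo,
                          volatile uint32_t *rx_status,
                          msg_t *out)
{
    if ((*rx_status & 0x1) == 0)        /* assumed: bit 0 = msg present */
        return 0;
    size_t len = (*rx_status >> 1) & 0xF;  /* assumed: length field     */
    for (size_t i = 0; i < len; i++)
        out->words[i] = rx_fifo[0];     /* each volatile load pops a word */
    out->len = len;
    return 1;
}

/* Buffered (second) case: when the destination process is not running
 * or the FIFO backs up, the OS diverts messages into a dynamically
 * managed queue in virtual memory; the application later receives them
 * from that queue instead of from the hardware. */
typedef struct vqueue vqueue_t;                       /* opaque, grows on demand */
extern int vqueue_dequeue(vqueue_t *q, msg_t *out);   /* hypothetical helper     */

int receive_message(volatile uint32_t *rx_fifo,
                    volatile uint32_t *rx_status,
                    vqueue_t *overflow, msg_t *out)
{
    /* Messages diverted to the software buffer are older than anything
     * still in the hardware FIFO, so drain them first; otherwise take
     * the common, fast path straight from the hardware. */
    if (vqueue_dequeue(overflow, out))
        return 1;
    return receive_direct(rx_fifo, rx_status, out);
}

The point of this structure is that the common case touches only a few loads on network-interface registers, while the rare buffered case goes through operating-system-managed virtual memory, so physical buffer space is consumed only when buffering actually occurs.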
Similar resources
Implications of I/O for Gang Scheduled Workloads
This paper examines the implications of gang scheduling for general-purpose multiprocessors. The workloads in these environments include both compute-bound parallel jobs, which often require gang scheduling, and I/O-bound jobs, which require high CPU priority to achieve interactive response times. Our results indicate that an effective interactive multiprocessor scheduler must weigh both the ben...
UDM: User Direct Messaging for General-Purpose Multiprocessing
User Direct Messaging (UDM) allows user-level, processor-to-processor messaging to coexist with general multiprogramming and virtual memory. Direct messaging, where processors launch and receive messages in tens of cycles directly via network interface FIFOs as opposed to indirectly via memory, offers high message bandwidth and low delivery latency by avoiding memory delay and buffer management...
Exploiting Two-Case Delivery for Fast Protected Messaging
We propose and evaluate two complementary techniques to protect and virtualize a tightly-coupled network interface in a multicomputer. The techniques allow efficient, direct application access to network hardware in a multiprogrammed environment while gaining most of the benefits of a memory-based network interface. First, two-case delivery allows an application to receive a message directly fro...
FUGU: Implementing Translation and Protection in a Multiuser, Multimodel Multiprocessor
Multimodel multiprocessors provide both shared memory and message passing primitives to the user for efficient communication. In a multiuser machine, translation permits machine resources to be virtualized and protection permits users to be isolated. The challenge in a multiuser multiprocessor is to provide translation and protection sufficient for general-purpose computing without compromising...
Congestion estimation of router input ports in Network-on-Chip for efficient virtual allocation
Effective and congestion-aware routing is vital to the performance of a network-on-chip, and an efficient routing algorithm relies on the selection strategy it uses. If the routing function returns more than one permissible output port, a selection function is used to choose the best output port and reduce packet latency. In this paper, we introduce a new selection s...
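As a rough illustration of what a congestion-aware selection function can look like, the fragment below picks, among the output ports permitted by the routing function, the one whose downstream buffer currently has the most free slots. The metric and all names (router_state_t, select_output_port, NUM_PORTS) are assumptions for illustration, not the algorithm proposed in the cited paper.

/* Hedged sketch of a congestion-aware selection function for a 2-D mesh
 * router; names and the congestion metric are illustrative only. */
#include <stddef.h>

#define NUM_PORTS 5   /* e.g. North, South, East, West, Local */

typedef struct {
    int free_slots[NUM_PORTS];   /* free buffer slots at each neighbour's input port */
} router_state_t;

/* Among the candidate ports returned by the routing function (assumed
 * non-empty), choose the one with the most free downstream buffer space. */
int select_output_port(const router_state_t *r,
                       const int *candidates, size_t n_candidates)
{
    int best = candidates[0];
    for (size_t i = 1; i < n_candidates; i++) {
        int p = candidates[i];
        if (r->free_slots[p] > r->free_slots[best])
            best = p;
    }
    return best;
}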
Publication date: 1998